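The data set contains daily observations of electrical energy consumption recorded over eight years (2010-2017).
The code below builds a date column from the year, month and day columns, converts the consumption to a daily time series and plots it: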
library(xts)
library(dygraphs)

# Build a Date column from the year (annee), month (mois) and day (jour) columns
D$Date <- as.Date(paste(D$annee, D$mois, D$jour, sep = "-"))

# Daily series starting on the day of the year of the first observation
Variab <- ts(D$Energie_trans,
             start = c(2010, as.numeric(format(D$Date[1], "%j"))),
             frequency = 365)
don <- xts(x = Variab, order.by = D$Date)

# Interactive plot with a range selector and a vertical crosshair
dygraph(don) %>%
  dyOptions(labelsUTC = TRUE, fillGraph = TRUE, fillAlpha = 0.1, drawGrid = FALSE, colors = "#D8AE5A") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = FALSE) %>%
  dyRoller(rollPeriod = 1)

The consumption of electrical energy shows an upward trend.
The plots above show an upward trend and a stationary remainder (noise).
As the level of the time series increases, the seasonal variation increases as well, which is why a multiplicative model is appropriate.
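To illustrate why a multiplicative form suits a series whose seasonal swings grow with its level, here is a minimal sketch in base R. It uses a simulated monthly series rather than the Tunisian data, and the names `trend_sim`, `season_sim`, `noise_sim` and `y_sim` are made up for the example.

```r
# Simulated series (assumption, not the real data): seasonal swings scale with the level
set.seed(1)
n_years <- 8
t_idx <- seq_len(12 * n_years)
trend_sim  <- 100 + 2 * t_idx                        # rising level
season_sim <- 1 + 0.3 * sin(2 * pi * t_idx / 12)     # seasonal factor around 1
noise_sim  <- exp(rnorm(length(t_idx), sd = 0.02))   # multiplicative noise around 1
y_sim <- ts(trend_sim * season_sim * noise_sim, start = c(2010, 1), frequency = 12)

# Classical decomposition with y = trend * seasonal * remainder
dec_mult <- decompose(y_sim, type = "multiplicative")

# Multiplying the components back recovers the series wherever the trend is defined
recon <- dec_mult$trend * dec_mult$seasonal * dec_mult$random
ok <- !is.na(recon)
max_abs_err <- max(abs(recon[ok] - y_sim[ok]))
print(max_abs_err)

# The estimated seasonal factors are normalised to average 1
print(mean(dec_mult$figure))
```

In an additive model the seasonal component would have a fixed amplitude in the units of the series; here it is a factor applied to the trend, so the swings widen as the level rises.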
Tmoy, tmin and Tmax are candidate explanatory variables. The choice of which one to include in the model is based on its correlation with energy consumption: explanatory variables that are strongly correlated with each other should not enter the same model, because their individual effects can then no longer be estimated reliably (multicollinearity).
library(ggcorrplot)

# Correlogram of column 8 (presumably the energy consumption) and the temperature variables (columns 5 to 7)
ggcorrplot(cor(D[, c(8, 5:7)]),
           outline.col = "white",
           lab = TRUE,
           lab_size = 5,
           lab_col = "#736F6E",
           ggtheme = ggplot2::theme_gray,
           colors = c("#595959", "white", "#6D9EC1"))

In what follows, tmin is the variable retained among tmin, Tmax and Tmoy; its correlation with energy consumption is 0.65.
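The selection rule described above (keep a single temperature variable, chosen by its correlation with the response) can be sketched as follows. The data are simulated and the column names `energy`, `tmin`, `tmoy` and `tmax` are illustrative assumptions; they are not the columns of `D`.

```r
# Simulated data (assumption): three temperature measures that move together
set.seed(42)
n <- 1000
tmoy_sim <- rnorm(n, mean = 20, sd = 6)
tmin_sim <- tmoy_sim - 5 + rnorm(n, sd = 3)
tmax_sim <- tmoy_sim + 5 + rnorm(n, sd = 3)
energy_sim <- 50 + 2 * tmin_sim + rnorm(n, sd = 8)   # response driven by tmin in this toy setup

sim <- data.frame(energy = energy_sim, tmin = tmin_sim, tmoy = tmoy_sim, tmax = tmax_sim)

# Correlation matrix of the response and the candidate predictors
cor_mat <- cor(sim)
print(round(cor_mat, 2))

# Correlation of each candidate with the response; keep the strongest in absolute value
cor_with_energy <- cor_mat["energy", c("tmin", "tmoy", "tmax")]
best_temp <- names(which.max(abs(cor_with_energy)))
print(best_temp)
```

The candidates are strongly correlated with one another in this simulation, which is the situation in which keeping only one of them is the simpler choice.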
For the remaining time-related variables (year, JF, Ramadhan), the following correlogram shows which ones are most strongly related to electrical energy consumption, based on the correlation values.
# Column 8 against the year (column 1), JF and Ramadhan (columns 9 and 10)
ggcorrplot(cor(D[, c(8, 1, 9:10)]),
           outline.col = "white",
           lab = TRUE,
           lab_size = 5,
           lab_col = "#736F6E",
           ggtheme = ggplot2::theme_gray,
           colors = c("#595959", "white", "#6D9EC1"))

Energy consumption is correlated with both the year and whether or not the day falls in Ramadan.
# Column 8 against the indicator columns 11 to 16
ggcorrplot(cor(D[, c(8, 11:16)]),
           outline.col = "white",
           lab = TRUE,
           lab_size = 5,
           lab_col = "#736F6E",
           ggtheme = ggplot2::theme_gray,
           colors = c("#595959", "white", "#6D9EC1"))

Electrical energy consumption is negatively correlated with the cooler months (January, February…) and positively correlated with the hotter months (July, August, September…). This suggests that in Tunisia electricity consumption increases only in the summer, when higher temperatures lead to heavy use of air conditioning.
# Column 8 against the indicator columns 17 to 22
ggcorrplot(cor(D[, c(8, 17:22)]),
           outline.col = "white",
           lab = TRUE,
           lab_size = 5,
           lab_col = "#736F6E",
           ggtheme = ggplot2::theme_gray,
           colors = c("#595959", "white", "#6D9EC1"))

# Column 8 against the indicator columns 23 to 29
ggcorrplot(cor(D[, c(8, 23:29)]),
           outline.col = "white",
           lab = TRUE,
           lab_size = 5,
           lab_col = "#736F6E",
           ggtheme = ggplot2::theme_gray,
           colors = c("#595959", "white", "#6D9EC1"))

Electrical energy consumption is negatively correlated with the weekend days (Saturday and Sunday) and positively correlated with the other days of the week.
Consumption decreases at weekends, probably because these are days off and most workplaces use far less energy.
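A day-of-week effect like the one described above is usually passed to a regression through 0/1 indicator columns, with one day left out as the reference level. The sketch below builds such columns from dates in base R and fits a toy regression. The column names follow the French day names used in this document, but the dates, the `energy_sim` values and the `Dimanche` column are assumptions made for illustration.

```r
# Four full weeks of dates (illustrative only)
dates <- seq(as.Date("2010-01-01"), by = "day", length.out = 28)

# ISO weekday number: 1 = Monday, ..., 7 = Sunday
dow <- as.integer(format(dates, "%u"))

# One 0/1 indicator column per day of the week
day_names <- c("Lundi", "Mardi", "Mercredi", "Jeudi", "Vendredi", "Samedi", "Dimanche")
dummies <- sapply(seq_along(day_names), function(k) as.integer(dow == k))
colnames(dummies) <- day_names

# Drop Sunday so that it becomes the reference level of the regression
dummies_model <- dummies[, day_names != "Dimanche"]

# Toy response: higher on working days, intermediate on Saturday, lowest on Sunday
set.seed(3)
energy_sim <- 100 + 20 * (dow <= 5) + 10 * (dow == 6) + rnorm(length(dow), sd = 1)
fit_sim <- lm(energy_sim ~ dummies_model)

# Each coefficient is the difference from the Sunday level (the intercept)
print(round(coef(fit_sim), 1))
```

Read this way, a positive coefficient on a weekday indicator means "higher than on the reference day", not "higher than average".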
The tslm function is used to explain electrical energy consumption with several explanatory variables:
library(forecast)

# Hold out the last year (365 days) to evaluate the forecast later
train <- D[1:2557, ]
test <- D[2558:2922, ]
train <- ts(train, start = c(2010, as.numeric(format(train$Date[1], "%j"))), frequency = 365)

# No indicator for December or Sunday appears in the formula, so they act as reference levels
fit <- tslm(Energie_trans ~ annee + tmin + JF + Ramadhan +
              Janvier + Fevrier + Mars + Avril + Mai + Juin + Juillet +
              Aout + Septembre + Octobre + Novembre +
              Lundi + Mardi + Mercredi + Jeudi + Vendredi + Samedi, data = train)
summary(fit)
##
## Call:
## tslm(formula = Energie_trans ~ annee + tmin + JF + Ramadhan +
## Janvier + Fevrier + Mars + Avril + Mai + Juin + Juillet +
## Aout + Septembre + Octobre + Novembre + Lundi + Mardi + Mercredi +
## Jeudi + Vendredi + Samedi, data = train)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -23000.4  -2241.9    -74.7   2079.2  18662.4
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.229e+06  8.535e+04 -49.552  < 2e-16 ***
## annee        2.128e+03  4.240e+01  50.195  < 2e-16 ***
## tmin         3.812e+02  3.157e+01  12.073  < 2e-16 ***
## JF          -5.907e+03  4.391e+02 -13.451  < 2e-16 ***
## Ramadhan     6.790e+02  3.610e+02   1.881   0.0601 .
## Janvier     -8.135e+02  4.133e+02  -1.968   0.0492 *
## Fevrier     -8.642e+01  4.245e+02  -0.204   0.8387
## Mars        -3.011e+03  4.118e+02  -7.312 3.52e-13 ***
## Avril       -4.693e+03  4.248e+02 -11.046  < 2e-16 ***
## Mai         -2.455e+03  4.419e+02  -5.556 3.04e-08 ***
## Juin         4.258e+03  5.000e+02   8.516  < 2e-16 ***
## Juillet      1.396e+04  5.741e+02  24.321  < 2e-16 ***
## Aout         1.437e+04  5.823e+02  24.681  < 2e-16 ***
## Septembre    5.096e+03  5.422e+02   9.400  < 2e-16 ***
## Octobre     -2.153e+03  4.848e+02  -4.442 9.31e-06 ***
## Novembre    -4.499e+03  4.318e+02 -10.419  < 2e-16 ***
## Lundi        6.858e+03  3.171e+02  21.627  < 2e-16 ***
## Mardi        8.192e+03  3.171e+02  25.838  < 2e-16 ***
## Mercredi     8.340e+03  3.171e+02  26.306  < 2e-16 ***
## Jeudi        8.438e+03  3.171e+02  26.614  < 2e-16 ***
## Vendredi     8.226e+03  3.168e+02  25.963  < 2e-16 ***
## Samedi       5.158e+03  3.171e+02  16.266  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4286 on 2535 degrees of freedom
## Multiple R-squared: 0.833, Adjusted R-squared: 0.8317
## F-statistic: 602.3 on 21 and 2535 DF, p-value: < 2.2e-16
##                        ME     RMSE     MAE        MPE     MAPE      MASE      ACF1
## Training set 1.615193e-13 4267.356 3046.05 -0.3339313 4.319366 0.5606067 0.7842202
The model explains about 83.17% of the variance in energy consumption on the training set (adjusted R-squared = 0.8317).
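For readers who want to see where the adjusted figure comes from, the sketch below recomputes it from the ordinary R-squared, the number of observations n and the number of predictors p, using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1). The regression is fitted on simulated data; `fit_sim`, `x1`, `x2` and `y` are names invented for the example.

```r
# Simulated regression (assumption, not the energy data)
set.seed(7)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n)
fit_sim <- lm(y ~ x1 + x2)
s <- summary(fit_sim)

# Adjusted R-squared penalises R-squared for the number of predictors p
p <- length(coef(fit_sim)) - 1   # intercept excluded
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)
print(all.equal(adj_manual, s$adj.r.squared))

# Same formula with the figures printed in the summary above:
# R-squared 0.833, 2557 training rows, 2535 residual degrees of freedom
adj_reported <- 1 - (1 - 0.833) * (2557 - 1) / 2535
print(round(adj_reported, 4))
```

With 21 predictors and more than 2,500 observations the penalty is small; the result agrees with the printed 0.8317 up to the rounding of the printed R-squared.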
The model fitted above is now used to forecast one year of electrical energy consumption in Tunisia, namely the held-out final year.
library(plotly)

# Forecast the held-out year from the explanatory variables in the test set
fore <- forecast(fit, newdata = test)

# Observed series, 95% interval (second column of lower/upper) and point forecast
p <- plot_ly() %>%
  add_lines(x = D$Date, y = D$Energie_trans,
            color = I("#595959"), name = "observed") %>%
  add_ribbons(x = test$Date, ymin = fore$lower[, 2], ymax = fore$upper[, 2],
              color = I("#b3b3ff"), name = "95% CI") %>%
  add_lines(x = test$Date, y = fore$mean, color = I("#0073e6"), name = "prediction")
p

The plot suggests that the forecast follows the observed consumption closely over the held-out year.
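The visual impression can be backed by error measures computed on the held-out year. The helper below is a minimal base R sketch; `holdout_accuracy` is a hypothetical function written for this example (it is not part of the forecast package), and the numbers it is applied to are toy values.

```r
# Hypothetical helper: basic error measures between observed and predicted values
holdout_accuracy <- function(actual, predicted) {
  err <- actual - predicted
  c(ME   = mean(err),                       # mean error (bias)
    RMSE = sqrt(mean(err^2)),               # root mean squared error
    MAE  = mean(abs(err)),                  # mean absolute error
    MAPE = mean(abs(err / actual)) * 100)   # mean absolute percentage error
}

# Toy values (assumption) standing in for the observed and forecast series
actual_sim <- c(100, 110, 120, 130)
pred_sim   <- c(90, 115, 120, 143)
acc_sim <- holdout_accuracy(actual_sim, pred_sim)
print(round(acc_sim, 3))
```

With the objects defined in this document, the corresponding call would presumably be `holdout_accuracy(test$Energie_trans, as.numeric(fore$mean))`, whose output can be compared with the training-set measures reported earlier.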
According to the model, electrical energy consumption is expected to keep increasing over time.
Cleveland, R. B., Cleveland, W. S., McRae, J. E., & Terpenning, I. J. (1990). STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6(1), 3–33. http://bit.ly/stl1990
Hyndman, R. J., & Athanasopoulos, G. Forecasting: Principles and Practice. Monash University, Australia. https://otexts.com/fpp2/
Holtz, Y. The R Graph Gallery. https://www.r-graph-gallery.com/index.html